Discretization and Grouping: Preprocessing Steps for Data Mining

Authors

  • Petr Berka
  • Ivan Bruha
Abstract

Unlike the on-line discretization performed by a number of machine learning (ML) algorithms when building decision trees or decision rules, we propose off-line algorithms for discretizing numerical attributes and grouping values of nominal attributes. The number of intervals obtained by discretization depends only on the data; the number of groups corresponds to the number of classes. Since both discretization and grouping are done with respect to the goal classes, the algorithms are suitable only for classification/prediction tasks. As a side effect of the off-line processing, the number of objects in the dataset and the number of attributes may be reduced. It should also be mentioned that although the discretization procedure was originally proposed for the Kex system, the algorithms perform well together with other machine learning algorithms.
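The abstract does not spell out the algorithms, so the following is only a minimal sketch of what class-driven off-line preprocessing can look like, not the authors' procedure. It assumes a simple majority-class rule: each nominal value is mapped to the most frequent class among the objects that carry it (so there are at most as many groups as classes), and a numerical attribute gets a cut point wherever two adjacent distinct values have different majority classes. The function names `group_nominal` and `discretize_numeric` are hypothetical.

```python
from collections import Counter, defaultdict


def majority_class(labels):
    """Return the most frequent class; ties go to the class seen first."""
    return Counter(labels).most_common(1)[0][0]


def _classes_by_value(values, classes):
    """Collect the class labels observed for each distinct attribute value."""
    by_value = defaultdict(list)
    for value, label in zip(values, classes):
        by_value[value].append(label)
    return by_value


def group_nominal(values, classes):
    """Assumed rule: map every nominal value to its majority class.

    Values that share a majority class fall into the same group.
    """
    by_value = _classes_by_value(values, classes)
    return {value: majority_class(labels) for value, labels in by_value.items()}


def discretize_numeric(values, classes):
    """Assumed rule: cut between adjacent distinct values whose
    majority classes differ; the cut is placed at the midpoint.
    """
    by_value = _classes_by_value(values, classes)
    distinct = sorted(by_value)
    cuts = []
    for left, right in zip(distinct, distinct[1:]):
        if majority_class(by_value[left]) != majority_class(by_value[right]):
            cuts.append((left + right) / 2)
    return cuts


if __name__ == "__main__":
    ages = [20, 22, 25, 40, 45, 60]
    labels = ["no", "no", "no", "yes", "yes", "no"]
    print(discretize_numeric(ages, labels))

    colors = ["red", "red", "blue", "green", "blue"]
    buys = ["yes", "yes", "no", "yes", "no"]
    print(group_nominal(colors, buys))
```

Because the cut points and groups are computed once, before any learner runs, the transformed dataset can be handed to any classification algorithm; objects that become identical after the transformation can then be merged, which is one way the number of objects may shrink.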

Related articles

An Evolutionary Multi-objective Discretization based on Normalized Cut

Learning models and related results depend on the quality of the input data. If raw data is not properly cleaned and structured, the results tend to be incorrect. Therefore, discretization, as one of the preprocessing techniques, plays an important role in learning processes. The most important challenge in the discretization process is to reduce the number of features’ values. This operat...

Implementation of Preprocessing Techniques in Datamining

carefully screened can produce misleading results. Thus, the raw data needs to be pre-processed before doing data mining, and this step can often take a considerable amount of processing time. Usually, data from experiments are not suitable for data mining tasks, because the raw data may contain out-of-range values, impossible data combinations, missing values, etc. Analyzing data withou...

Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying the biomarkers involved. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profiles. This dataset suffers from the problem of imbalanced classes...

Discretization of Numerical Attributes Preprocessing for Machine Learning

The area of knowledge discovery and data mining is growing rapidly. A large number of methods are employed to mine knowledge. Several of the methods rely on discrete data. However, most datasets used in real applications have attributes with continuous values. To make the data mining techniques useful for such datasets, discretization is performed as a preprocessing step o...

Result Comparison of Two Rough Set Based Discretization Algorithms

The area of knowledge discovery and data mining is growing rapidly. A large number of methods are employed to mine knowledge. Many of the methods rely on discrete data. However, most of the datasets used in real applications have attributes with continuous values. To make the data mining techniques useful for such datasets, discretization is performed as a preprocessing step of the data mining. ...

Publication date: 1998